Active appearance models for photorealistic visual speech synthesis

Authors

  • Wesley Mattheyses
  • Lukas Latacz
  • Werner Verhelst
Abstract

The perceived quality of a synthetic visual speech signal depends greatly on the smoothness of the presented visual articulators. This paper explains how concatenative visual speech synthesis systems can apply active appearance models to achieve smooth and natural visual output speech. By modeling the visual speech contained in the system’s speech database, the shape and the texture of the talking head can be synthesized separately. This allows the system to balance the articulation strength of the visual articulators against the smoothness of the visual signal in order to optimize the synthesis. To further improve the synthesis quality, an automatic database normalization strategy was designed that removes variations unrelated to speech production from the database. A perception experiment verified that this normalization strategy significantly improves the perceived signal quality.
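
The trade-off between articulation strength and smoothness can be illustrated with a minimal sketch. It assumes each video frame has already been mapped to an AAM shape parameter vector and an AAM texture parameter vector, stored as plain Python lists. The function names, the window sizes, the centered moving average used as the smoothing filter, and the `normalize` helper are all hypothetical illustrations, not the authors' implementation. Because AAM parameters are PCA coefficients, setting a coefficient to 0 reproduces the model mean along that mode; the hypothetical `normalize` uses this to suppress modes assumed (not shown by the paper) to be unrelated to speech.

```python
from typing import List, Sequence, Set, Tuple

Frame = List[float]  # one AAM parameter vector (PCA coefficients) per video frame


def moving_average(track: Sequence[Frame], window: int) -> List[Frame]:
    """Centered moving average over time; frames near the edges use a shorter window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(track)
    smoothed: List[Frame] = []
    for t in range(n):
        span = track[max(0, t - half):min(n, t + half + 1)]
        dims = len(track[t])
        smoothed.append([sum(f[d] for f in span) / len(span) for d in range(dims)])
    return smoothed


def roughness(track: Sequence[Frame]) -> float:
    """Sum of squared frame-to-frame differences: larger means a jerkier trajectory."""
    return sum(
        sum((b - a) ** 2 for a, b in zip(prev, cur))
        for prev, cur in zip(track, track[1:])
    )


def normalize(track: Sequence[Frame], speech_dims: Set[int]) -> List[Frame]:
    """Hypothetical normalization: reset modes not flagged as speech-related to 0,
    i.e. to the model mean, since AAM parameters are PCA coefficients."""
    return [[v if d in speech_dims else 0.0 for d, v in enumerate(f)] for f in track]


def synthesize(
    shape_track: Sequence[Frame],
    texture_track: Sequence[Frame],
    shape_window: int = 3,
    texture_window: int = 7,
) -> Tuple[List[Frame], List[Frame]]:
    """Smooth shape lightly (to keep articulation strength) and texture more heavily
    (to hide discontinuities). The window sizes are illustrative assumptions."""
    return (
        moving_average(shape_track, shape_window),
        moving_average(texture_track, texture_window),
    )


if __name__ == "__main__":
    # Toy 1-D shape trajectory with abrupt jumps, e.g. at concatenation points.
    shape = [[0.0], [3.0], [0.0], [3.0], [0.0]]
    texture = [[1.0, 5.0], [1.0, -5.0], [1.0, 5.0], [1.0, -5.0], [1.0, 5.0]]
    smooth_shape, smooth_texture = synthesize(shape, texture)
    print(smooth_shape)                               # [[1.5], [1.0], [2.0], [1.0], [1.5]]
    print(roughness(shape), roughness(smooth_shape))  # 36.0 2.5
    print(normalize([[0.4, -1.2, 0.7]], {0, 2}))      # [[0.4, 0.0, 0.7]]
```

In this toy run, smoothing lowers the roughness of the shape trajectory from 36.0 to 2.5 but also shrinks its peak-to-peak amplitude from 3.0 to 1.0, which is the articulation-versus-smoothness trade-off in miniature. Keeping the shape window short while widening only the texture window is one way a system could exploit the separate shape and texture parameters.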

Similar resources

Optimized photorealistic audiovisual speech synthesis using active appearance modeling

Active appearance models can represent image information in terms of shape and texture parameters. This paper explains why this makes them highly suitable for data-based 2D audiovisual text-to-speech synthesis. We elaborate on how the differentiation between shape and texture information can be fully exploited to create appropriate unit-selection costs and to enhance the video concatenations. T...

Mary101: A Photorealistic Text-to-Audio-Visual Speech Synthesizer

Previous Work: Much of the previous work in text-to-audio-visual (TTAVS) speech synthesis [9] [2] has focused on integrating physically-based facial models with a particular speech synthesis system in order to give the impression of a “talking face”. Some TTAVS systems have also resorted to Cyberware scanning techniques to overlay realistic-looking skin texture on top of the underlying graphics ...

Chromatic Adaptation Post-Filtering in Image Synthesis Reproduction of Ancient Building for Restoration Support

Within the field of cultural heritage restoration, experts are interested in the analysis of data describing the condition and history of ancient monuments. Data are usually distributed over many sites. VRML and Java technology are well-suited for describing geometrical models and data interaction over the Internet. Unfortunately, the poor quality of VRML real time rendering is a bottlen...

Visual speech synthesis using statistical models of shape and appearance

In this paper we present preliminary results of work towards a video-realistic visual speech synthesizer based on statistical models of shape and appearance. A sequence of images corresponding to an utterance is formed by concatenation of synthesis units (in this case triphones) from a pre-recorded inventory. Initial work has concentrated on a compact representation of human faces, accommodatin...

Photo-realistic visual speech synthesis based on AAM features and an articulatory DBN model with constrained asynchrony

This paper presents a photo-realistic visual speech synthesis method based on an audio-visual articulatory dynamic Bayesian network model (AF_AVDBN) in which the maximum asynchronies between the articulatory features, such as lips, tongue and glottis/velum, can be controlled. Perceptual linear prediction (PLP) features from the audio speech and active appearance model (AAM) features from mouth ...

Publication date: 2010